Abstract

Background: Emergency department (ED) overcrowding has become a frequent topic of investigation. Despite a significant body of research, there is no standard definition or measurement of ED crowding. Four quantitative scales for ED crowding have been proposed in the literature: the Real-time Emergency Analysis of Demand Indicators (READI), the Emergency Department Work Index (EDWIN), the National Emergency Department Overcrowding Study (NEDOCS) scale, and the Emergency Department Crowding Scale (EDCS). These four scales have yet to be independently evaluated and compared.

Objectives: The goals of this study were to formally compare four existing quantitative ED crowding scales by measuring their ability to detect instances of perceived ED crowding and to determine whether any of these scales provide a generalizable solution for measuring ED crowding.

Methods: Data were collected at two-hour intervals over 135 consecutive sampling instances. Physician and nurse agreement was assessed using weighted κ statistics. The crowding scales were compared via correlation statistics and by their ability to predict perceived instances of ED crowding. Sensitivity, specificity, and positive predictive values were calculated at site-specific cut points and at the recommended thresholds.

Results: All four of the crowding scales were significantly correlated, but their predictive abilities varied widely. NEDOCS had the highest area under the receiver operating characteristic curve (AROC) (0.92), while EDCS had the lowest of the four scales (0.64). The recommended thresholds for the crowding scales were rarely exceeded; therefore, the scales were adjusted to site-specific cut points. At a site-specific cut point of 37.19, NEDOCS had the highest sensitivity (0.81), specificity (0.87), and positive predictive value (0.62).

Conclusions: At the study site, the suggested thresholds of the published crowding scales did not agree with providers' perceptions of ED crowding. Even after adjusting the scales to site-specific thresholds, a relatively low prevalence of ED crowding resulted in unacceptably low positive predictive values for each scale. These results indicate that these crowding scales lack scalability and do not perform as designed in EDs where crowding is not the norm. However, two of the crowding scales, EDWIN and NEDOCS, and one of the READI subscales, bed ratio, yielded good predictive power (AROC >0.80) for perceived ED crowding, suggesting that they could be used effectively after a period of site-specific calibration at EDs where crowding is a frequent occurrence.
Overcrowding, extended waiting times, and patient care delays are common problems in emergency departments (EDs) nationwide. 1-7 ED overcrowding has become a frequent topic of investigation; a 2004 bibliography on the subject assembled by the American College of Emergency Physicians contains 76 references. 8 Despite a great deal of effort and a significant body of research, there is no standard definition or measurement of ED crowding. In a review of the literature, Hwang and Concato found 23 articles with distinct definitions of ED overcrowding. 9-32 The lack of standard measures for ED crowding makes it difficult to determine the causes and consequences of ED crowding across institutions. For this reason, several investigators have sought to develop quantitative scales to measure ED crowding. Four quantitative crowding scales have been proposed in the emergency medicine literature: the Real-time Emergency Analysis of Demand Indicators (READI), 14,33 the Emergency Department Work Index (EDWIN), 34,35 the National Emergency Department Overcrowding Study (NEDOCS) scale, 35-37 and the Emergency Department Crowding Scale (EDCS). 38,39

All of the investigators who have proposed a quantitative crowding scale have conducted one or more corresponding evaluation studies, 32-39 and in one case two of the quantitative crowding scales were compared at the same site. 34 While there is no criterion-standard measurement of ED crowding, the common mode of evaluation for each of these scales has been their ability to match ED clinicians' perceptions of crowding. Although there has been some debate about the reliability of clinicians' perceptions, 33,40 this serves as a starting point and provides a common benchmark against which to compare the four scales.

To the best of our knowledge, this study marks the first independent evaluation of these crowding scales and the first time all four have been compared together. The goals of this study were to formally compare four existing quantitative ED crowding scales by measuring their ability to detect instances of perceived ED crowding and to determine whether any of these scales provide a generalizable solution for measuring ED crowding.



METHODS 



Study Design 

This was a prospective study of four quantitative crowding scales. No patients were contacted, and no patient-specific data were collected. The local institutional review board approved this study and waived the requirement for informed consent.

Study Setting and Population 

This study took place at a 31-bed ED with approximately 40,000 visits annually and the highest case mix index (a formulaic measure of acuity) in the state. The ED is attached to a 520-bed Level 1 trauma center affiliated with a university school of medicine. Interns from emergency medicine, internal medicine, and a hospital-based transitional program rotate in the ED. The ED is affiliated with the local residency in emergency medicine. The ED has 56 hours of attending physician coverage per day and is staffed by a single group of 16 physicians. Nurse staffing is based on historical patterns of demand for emergency services. The number of nurses in the department ranges from eight in the late afternoon and evening to three during the early morning hours.



Study Protocol 

The data needed to calculate each of the four scales were collected at two-hour intervals over 135 consecutive sampling instances (approximately 11 days). Most of the data needed to compute the various scales were collected automatically by the department's electronic patient tracking system. However, two key data points could not be accurately obtained from the patient tracking system: the number of patients on a ventilator and the number of nurses currently working in the ED. To collect these data points, a data collection tool was loaded on the computer assigned to the ED clerk. The data collection tool was automatically displayed as a pop-up window at the predetermined sampling times, and the clerk entered the number of ventilated patients and the number of nurses manually.

At corresponding sampling instances, one attending physician and one nurse were surveyed to assess their perceived level of ED crowding using a previously validated single-question Likert-type instrument. 34,39 This same survey instrument was used to assess clinicians' perceptions of ED crowding in studies involving EDWIN and EDCS. 34,40 It is also comparable to the 100-mm visual analog scale used to assess perceptions of ED crowding in the NEDOCS study. 41 The anonymous survey was taken by a pool of 12 charge nurses and a pool of 16 attending physicians, including two of the investigators. The charge nurses are among the most experienced nurses working in the ED, and each of the 16 attending physicians had more than two years of experience working at our ED. The physicians and the nurses were asked to respond to the following question: "How busy would you say the ED is right now? (Please take into account the workload of all the other doctors and nurses as well as your own workload.)" The nurses and physicians had the following five response options: 1) not busy at all, not crowded; 2) steady, easily keeping up; 3) average: working hard, but keeping up; 4) more crowded and busy than desirable; 5) extremely busy, very crowded. The survey instrument was presented to the nurses in electronic format. As with the clerk's collection tool, the nurses' survey tool was loaded onto a workstation assigned to the ED charge nurse. The investigators met with several of the nurses and demonstrated how the survey tool functioned. To ensure that instructions were readily available, a single-sheet instruction form was posted next to the computer on the counter. The physicians requested a paper-based survey, and these survey forms were placed near the charge nurse's workstation. At the sampling times, the nurse taking the survey gave one copy of the paper survey to an attending physician. The investigators emphasized that it was important that the physicians and nurses not confer with one another when completing the surveys.

Measurements 

Due to the lack of a criterion-standard measure for ED crowding, the results of the nurse and physician surveys served as the primary outcome of interest. The four crowding scales were assessed based on their ability to predict instances of perceived ED crowding as defined by a composite perception score (the average of the physician and nurse survey scores). To maximize the number of sampling instances that could be analyzed, any sampling instance that had a valid response from one or more clinicians was assigned a composite score. An instance of perceived ED crowding was defined as any sampling instance where the composite score was >3.
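The composite-score rule can be sketched as follows. This is a minimal illustration, assuming ratings are stored as integers 1-5 with None for a missing survey; the function names are hypothetical and do not describe the study's own software (the analysis itself was run in SAS).

```python
from typing import Optional


def composite_score(physician: Optional[int], nurse: Optional[int]) -> Optional[float]:
    """Average whichever 1-5 ratings are available; None if neither clinician responded."""
    ratings = [r for r in (physician, nurse) if r is not None]
    if not ratings:
        return None  # "no response" instance: cannot be scored
    return sum(ratings) / len(ratings)


def perceived_crowding(physician: Optional[int], nurse: Optional[int]) -> Optional[bool]:
    """Perceived crowding: composite score strictly greater than 3."""
    score = composite_score(physician, nurse)
    return None if score is None else score > 3


print(perceived_crowding(4, 3))     # True  (composite 3.5)
print(perceived_crowding(3, 3))     # False (composite 3.0)
print(perceived_crowding(None, 4))  # True  (nurse rating only)
```

Under this rule, a rating of 4 from one clinician paired with a 3 from the other is already enough to flag an instance as crowded, and a single available rating stands in for the average.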

READI. As the name implies, the READI scores are intended for real-time assessment of crowding in an ED. The READI scores consist of three distinct indicators of ED crowding and one omnibus indicator. The first indicator, bed ratio (BR), is intended to quantify the relationship between the number of ED patients and the number of treatment spaces. A BR >1 indicates overcrowding. The second indicator is the acuity ratio (AR), which is simply the average acuity of the current ED population. The AR is based on a four-level acuity scale and is designed to measure the burden of illness currently faced by the ED. An AR ≈1 indicates a low burden of illness, and an AR ≈4 indicates a severe burden of illness. Our institution uses a five-level triage scale comparable to the Emergency Severity Index, 42 so a series of transformations was necessary to map our scale onto a four-level scale. The third indicator is the provider ratio (PR). The PR incorporates the number of current ED patients, historical arrival rates, and historical measures of the physicians' ability to move patients through the ED. A PR >1.5 indicates an understaffed ED. The demand value (DV) is an omnibus measure of ED crowding that incorporates the three aforementioned indicators. Empirical computer simulations by the investigators suggest that a DV >7 indicates overcrowding. 14,33
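The READI arithmetic can be sketched as below, with BR and PR taken as precomputed inputs. Two pieces are assumptions rather than details reported here: the linear rescaling from a five-level ESI-style scale (1 = most acute) to READI's four-level acuity scale (4 = most severe) is a hypothetical stand-in for the unspecified "series of transformations", and the combination rule DV = (BR + PR) × AR is assumed and should be verified against the READI papers (refs 14,33).

```python
from typing import List


def esi_to_four_level(esi: int) -> float:
    """HYPOTHETICAL linear map: ESI 1 (most acute) -> 4.0, ESI 5 (least acute) -> 1.0."""
    if not 1 <= esi <= 5:
        raise ValueError("triage level must be between 1 and 5")
    return 1.0 + (5 - esi) * 0.75


def acuity_ratio(triage_levels: List[int]) -> float:
    """AR: mean acuity of the patients currently in the ED, on the four-level scale."""
    if not triage_levels:
        raise ValueError("AR is undefined for an empty ED")
    return sum(esi_to_four_level(t) for t in triage_levels) / len(triage_levels)


def demand_value(br: float, pr: float, ar: float) -> float:
    """ASSUMED combination rule DV = (BR + PR) * AR; verify against the READI papers."""
    return (br + pr) * ar


# Published READI cutoffs quoted in the text: BR > 1, PR > 1.5, DV > 7.
READI_CUTOFFS = {"BR": 1.0, "PR": 1.5, "DV": 7.0}

ar = acuity_ratio([2, 3, 3, 4])             # hypothetical mix of four patients
dv = demand_value(br=0.60, pr=0.85, ar=ar)  # roughly 3.6, well below the DV cutoff
print(ar, dv > READI_CUTOFFS["DV"])         # 2.5 False
```

As a loose plausibility check only: plugging the Table 2 medians (BR 0.60, PR 0.85, AR 2.95) into the assumed rule gives about 4.28, near the reported DV median of 4.39, although medians do not combine exactly.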

In an evaluation of the READI scores, Reeder et al. found slight agreement between ED clinicians and the READI scores. This study also found comparably low levels of agreement between ED clinicians. 33 Although the results of this study indicate that READI scores do not agree with clinicians' perceptions of crowding, in order to be comprehensive, these measures were included in our independent evaluation.

EDWIN. EDWIN is based on four data points: the number of patients in the ED grouped by triage category (based on the Emergency Severity Index), the number of attending physicians, the number of treatment beds in the ED, and the number of admitted patients waiting for an inpatient bed. Like READI, EDWIN is intended for use in real-time analysis of ED workload. The EDWIN investigators suggest that ED activity may be demarcated into three zones: an active but manageable ED has an EDWIN score <1.5, a busy ED has an EDWIN score between 1.5 and 2, and a crowded ED has an EDWIN score >2. An evaluation by Bernstein et al. showed that EDWIN was highly correlated with clinicians' perceptions of ED crowding. 34 This evaluation also demonstrated a strong association between EDWIN and periods of ambulance diversion. In addition to the evaluation by Bernstein et al., Weiss et al. found significant correlation between EDWIN and clinicians' perceptions of ED overcrowding. 35
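A sketch of the EDWIN calculation from the four inputs listed above, assuming the ratio form EDWIN = Σ n_i·t_i / [N_a·(B_T - B_A)], where n_i is the number of patients in triage category i, t_i a weight for that category, N_a the number of attending physicians, B_T the number of treatment beds, and B_A the number of admitted patients boarding (check this form against ref. 34). The triage weights below are hypothetical placeholders; the zone labels follow the cutoffs quoted in the text.

```python
from typing import Dict

# HYPOTHETICAL triage weights (higher = more acute); substitute the published weighting.
EXAMPLE_WEIGHTS: Dict[int, float] = {1: 5.0, 2: 4.0, 3: 3.0, 4: 2.0, 5: 1.0}


def edwin(patients_by_level: Dict[int, int], weights: Dict[int, float],
          attendings: int, treatment_beds: int, boarders: int) -> float:
    """Assumed EDWIN form: sum(n_i * t_i) / (N_a * (B_T - B_A))."""
    workload = sum(n * weights[level] for level, n in patients_by_level.items())
    open_beds = treatment_beds - boarders  # each boarder removes a bed from the denominator
    if attendings <= 0 or open_beds <= 0:
        raise ValueError("attendings and open beds must both be positive")
    return workload / (attendings * open_beds)


def edwin_zone(score: float) -> str:
    """Zones quoted in the text: <1.5 manageable, 1.5-2 busy, >2 crowded."""
    if score < 1.5:
        return "active but manageable"
    if score <= 2.0:
        return "busy"
    return "crowded"


score = edwin({2: 4, 3: 10, 4: 6}, EXAMPLE_WEIGHTS, attendings=3,
              treatment_beds=31, boarders=5)
print(round(score, 2), edwin_zone(score))  # 0.74 active but manageable
```

Holding the patient mix fixed, each additional boarder raises the score because it shrinks the denominator; this is the mechanism by which boarding enters EDWIN.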

NEDOCS. The NEDOCS scale was developed as part of a multicenter study and requires seven inputs: total number of ED beds, number of inpatient hospital beds, total number of patients in the ED, total number of patients on a ventilator in the ED, longest current patient stay (in hours), total number of patients in the ED awaiting an inpatient bed, and the waiting time in hours for the last patient placed in an ED treatment bay. The NEDOCS investigators suggest that a NEDOCS score >100 indicates overcrowding. Multiple evaluation studies have demonstrated that NEDOCS is highly correlated with clinicians' perceptions of crowding, ambulance diversion, and patients leaving without being seen by a physician.
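The seven NEDOCS inputs combine linearly. The sketch below uses the coefficients commonly cited for the original NEDOCS model (intercept -20; 85.8 on the ED patient-to-bed ratio; 600 on the boarder-to-hospital-bed ratio; 13.4 per ventilated patient; 0.93 and 5.64 per hour for the two time inputs). These coefficients are not given in this paper and are an assumption to verify against refs 35-37; the parameter names and the snapshot values are hypothetical.

```python
def nedocs(ed_beds: int, hospital_beds: int, ed_patients: int, ventilated: int,
           longest_stay_hours: float, admits_waiting: int,
           last_bed_wait_hours: float) -> float:
    """Linear NEDOCS score; the coefficients are ASSUMED (commonly cited values)."""
    return (-20.0
            + 85.8 * ed_patients / ed_beds            # ED occupancy
            + 600.0 * admits_waiting / hospital_beds  # boarders relative to hospital size
            + 13.4 * ventilated                       # patients on ventilators
            + 0.93 * longest_stay_hours               # longest current stay (hours)
            + 5.64 * last_bed_wait_hours)             # wait of last patient bedded (hours)


# Hypothetical snapshot at a 31-bed ED in a 520-bed hospital.
score = nedocs(ed_beds=31, hospital_beds=520, ed_patients=20, ventilated=0,
               longest_stay_hours=4.0, admits_waiting=3, last_bed_wait_hours=0.5)
print(round(score, 1))  # 45.4
```

Two features of this form are worth noting. The negative intercept means a nearly empty ED scores below zero, consistent with the negative minimum for NEDOCS in Table 2. And at a 520-bed hospital, each additional boarder adds a fixed 600/520 ≈ 1.15 points regardless of local norms.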

EDCS. EDCS is also the product of a multisite crowding study, and it seeks to provide an objective measure of ED crowding based on a small set of easily accessible factors. The specific inputs to the EDCS are the number of attending emergency physicians, number of staffed ED beds, number of critical care patients, total number of ED patients, number of staffed hospital beds, and hospital occupancy rate. The EDCS was significantly correlated with treatment times, boarding times, ambulance diversion, and the number of patients who leave without being seen by a physician. An EDCS score >65 was found to be predictive of both ambulance diversion and the number of patients who leave without being seen by a physician.

Data Analysis 

At each sampling instance, there were three potential levels of clinician response: complete, partial, or no response. A complete response was a sampling instance where both nurse and physician surveys were obtained, a partial response was a sampling instance where either a nurse or a physician survey (but not both) was obtained, and no response was a sampling instance when neither a nurse nor a physician survey was obtained. To address the possibility that different levels of ED crowding would induce response bias (i.e., clinicians would not take the time to respond to the survey when the ED was busy), the mean census, DV, EDWIN, NEDOCS, and EDCS for complete-response sampling instances and for partial- or no-response sampling instances were compared via Student's t-tests. Physician and nurse interrater reliability was assessed using Cohen's weighted κ with Fleiss-Cohen weights; this calculation is based on the difference between observed agreement and the level of agreement expected by random chance. The κ statistic ranges from -1 to 1. Negative values indicate less agreement than expected by chance (systematic disagreement), a value of 0 is equivalent to random chance, and 1 indicates perfect agreement. A weighted κ statistic is used when ratings are based on an ordinal scale (as with our Likert survey instrument) and accounts for differing levels of disagreement. 43-47 Pairwise Spearman correlation coefficients (ρ) were computed to test the monotonic association between each pair of the major crowding scales (AR, BR, and PR were not included in this portion of the analysis). The Spearman correlation coefficient also ranges from -1 to 1. Values close to -1 indicate a strong inverse association, while values close to 1 indicate a strong positive association. Each of the crowding scales, including the READI subscales, was entered into a single-variable logistic regression model to assess its ability to predict perceived instances of overcrowding. The logistic regression analysis allowed for the assessment of the predictive discrimination of the different crowding scales via the generalized area under the receiver operating characteristic curve (AROC). 48 The AROCs of the crowding scales were compared using a nonparametric technique that allows for the comparison of areas under correlated ROC curves. 49 All statistical analysis was performed using SAS (version 9.1; SAS Institute, Inc., Cary, NC).

Table 1
Frequency Table for Clinician Response

                          Nurse Rating
Physician Rating     1     2     3     4     5
       1            23     6     1     0     0
       2             6    18    11     1     0
       3             2     7     7     3     1
       4             0     0     4     8     0
       5             0     1     1     3     1

The frequency distribution of physician and nurse responses to a five-point crowding instrument for the 104 sampling instances with both responses. A rating of 1 indicates that the ED is not at all crowded, while a rating of 5 indicates that the ED is extremely busy and very crowded. Cells on the diagonal (equal physician and nurse ratings) represent instances of exact agreement.
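For a single-predictor logistic model with a positive coefficient, the generalized AROC described above equals the probability that a randomly chosen crowded instance has a higher score than a randomly chosen uncrowded one, with ties counted as one half (the Mann-Whitney form of the area). The sketch below computes that quantity directly. It is an illustration only, not the SAS procedure used in the study, and it does not implement the correlated-ROC comparison of ref. 49.

```python
from typing import List


def aroc(scores: List[float], crowded: List[bool]) -> float:
    """Empirical area under the ROC curve (Mann-Whitney form, ties count 0.5)."""
    pos = [s for s, y in zip(scores, crowded) if y]
    neg = [s for s, y in zip(scores, crowded) if not y]
    if not pos or not neg:
        raise ValueError("need at least one crowded and one uncrowded instance")
    wins = 0.0
    for p in pos:          # compare every crowded instance ...
        for n in neg:      # ... with every uncrowded instance
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(aroc([1, 2, 2, 3], [False, True, False, True]))  # 0.875
```

An uninformative scale scores about 0.5 and a perfect separator scores 1.0; the quadratic loop is adequate for roughly a hundred sampling instances.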

RESULTS 

There were 135 sampling instances over the 11-day study. All four of the quantitative crowding scales were calculated for each sampling instance. Nurse perceptions of crowding were captured at 117 (87%) of the sampling instances, and the physicians responded at 108 (80%) of the sampling instances. Both nurse and physician perceptions were obtained at 104 (77%) of the sampling instances, and at least one perception of crowding (from either the nurse or the physician) was captured at 121 (90%) of the sampling instances. Student's t-tests showed no statistically significant differences in the mean census, DV, EDWIN, NEDOCS, and EDCS between sampling instances where a complete clinician response was received and sampling instances where a partial or no response was received. The physicians rated the ED as being overcrowded (rating >3) at 19 (18%) of their valid sampling instances, the nurses rated the ED as being overcrowded at 19 (16%) of their valid sampling instances, and the composite crowding score (the average of the nurse and physician ratings) rated the ED as being crowded at 26 (22%) of the valid sampling instances. Table 1 shows the frequency of physician and nurse responses to the five-point survey instrument and contains the agreement matrix for the sampling instances where both a nurse and a physician response were obtained. The weighted κ statistic, a measure of interrater agreement between the nurses and physicians, was 0.73 (95% confidence interval = 0.63 to 0.85). This indicates substantial agreement between the nurses and physicians and suggests that perception is a fairly reliable outcome measure at our institution. 43
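The reported weighted κ can be recomputed from the agreement matrix in Table 1, taking rows as physician ratings and columns as nurse ratings (the statistic is symmetric, so the orientation does not affect the result). Fleiss-Cohen weights are the quadratic agreement weights w_ij = 1 - (i - j)^2/(k - 1)^2. This sketch reproduces the point estimate only; the confidence interval requires a variance formula that is not shown here.

```python
from typing import List

# Agreement matrix from Table 1: rows = physician rating 1-5, columns = nurse rating 1-5.
TABLE_1 = [
    [23, 6, 1, 0, 0],
    [6, 18, 11, 1, 0],
    [2, 7, 7, 3, 1],
    [0, 0, 4, 8, 0],
    [0, 1, 1, 3, 1],
]


def fleiss_cohen_kappa(table: List[List[int]]) -> float:
    """Cohen's weighted kappa with quadratic (Fleiss-Cohen) agreement weights."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = p_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = 1.0 - (i - j) ** 2 / (k - 1) ** 2        # 1 on the diagonal, 0 at the extremes
            p_obs += w * table[i][j] / n                   # weighted observed agreement
            p_exp += w * row_tot[i] * col_tot[j] / n ** 2  # weighted chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)


print(sum(map(sum, TABLE_1)), round(fleiss_cohen_kappa(TABLE_1), 2))  # 104 0.73
```

The result, about 0.727, rounds to the reported 0.73, which also serves as a check that the 104 paired responses in Table 1 have been read correctly.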

Table 2 displays the summary statistics for all of the computed crowding scales. Notably, the suggested crowding thresholds for NEDOCS and EDWIN were never reached during our study period. Hypothesis tests showed that all of the scales were significantly correlated (Table 3). NEDOCS and EDWIN had the highest correlation coefficient (ρ = 0.67, p < 0.01), and EDWIN and EDCS had the lowest (ρ = 0.26, p < 0.01).
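Spearman's ρ is the Pearson correlation of the ranks, which is why it captures monotonic rather than strictly linear association. A minimal sketch with average ranks for ties follows; the function names are illustrative, and p-values are not computed here.

```python
from math import sqrt
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next sorted value ties with the first one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: List[float], y: List[float]) -> float:
    """Pearson correlation of the average ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


print(average_ranks([10, 20, 20, 30]))            # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0 (monotonic, not linear)
```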

Table 4 contains the AROC and associated confidence intervals for each crowding scale. NEDOCS had the highest AROC (0.92), while PR, one of the READI subscales, had the lowest AROC (0.53). BR, another READI subscale, and EDWIN also performed relatively well in terms of the AROC, with values of 0.86 and 0.84, respectively. Hypothesis testing showed no statistically significant difference between the ROC curves for EDWIN and NEDOCS (p = 0.12) or between those for DV and EDCS (p = 0.71). However, there were significant differences (p < 0.05) between each of NEDOCS and EDWIN and each of DV and EDCS. These differences suggest that NEDOCS and EDWIN were significantly better predictors of perceived instances of ED overcrowding at our institution than DV and EDCS.

Table 2
Summary Statistics for Each Crowding Scale

Crowding Scale   Median   Interquartile Range   Mean    Range             SD
DV               4.39     3.25-5.55             4.53    0.40-10.91        2.06
BR               0.60     0.30-0.89             0.64    0.01-2.38         0.39
AR               2.95     2.79-3.00             2.90    0.75-3.40         0.31
PR               0.85     0.59-1.17             0.92    0.13-2.47         0.45
EDWIN            0.49     0.35-0.64             0.49    0-1.21            0.20
NEDOCS           19.36    4.51-41.34            24.19   -13.06 to 75.44   23.24
EDCS             17.00    6.00-27.00            19.65   1.00-79.00        16.21

For each crowding scale, n = 135. DV = demand value; BR = bed ratio; AR = acuity ratio; PR = provider ratio; EDWIN = Emergency Department Work Index; NEDOCS = National Emergency Department Overcrowding Study; EDCS = Emergency Department Crowding Scale.

Table 3
Spearman Correlation Coefficients

            DV              EDWIN           NEDOCS          EDCS
            ρ      p        ρ      p        ρ      p        ρ      p
DV          1.00   n/a      0.49   <0.01    0.41   <0.01    0.50   <0.01
EDWIN       0.49   <0.01    1.00   n/a      0.67   <0.01    0.26   <0.01
NEDOCS      0.41   <0.01    0.67   <0.01    1.00   n/a      0.29   <0.01
EDCS        0.50   <0.01    0.26   <0.01    0.29   <0.01    1.00   n/a

DV = demand value; EDWIN = Emergency Department Work Index; NEDOCS = National Emergency Department Overcrowding Study; EDCS = Emergency Department Crowding Scale.

As part of the AROC analysis, site-specific cut points were determined for each of the four scales based on a sensitivity of approximately 80%. Table 5 contains the computed sensitivities, specificities, and positive predictive values (PPVs) for each scale at the site-specific cut points. Also included in Table 5 are the sensitivities, specificities, and PPVs for each scale at the thresholds prescribed by the original investigators. At a site-specific cut point of 37.19, NEDOCS had the highest sensitivity (0.81, tied with EDWIN), specificity (0.87), and PPV (0.62). None of the crowding scales performed well at the published thresholds. The thresholds for NEDOCS and EDWIN were never reached, and the thresholds for EDCS and DV were rarely exceeded, producing very low levels of sensitivity (Table 5).
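One plausible way to pick a site-specific cut point "based on a sensitivity of approximately 80%" is to take the highest observed score whose sensitivity is still at least 0.80, which maximizes specificity subject to that constraint. The exact rule used in the study is not stated, so this rule, the score >= cut convention, the function names, and the data below are assumptions for illustration. Returning None for the PPV when no instance reaches the cut mirrors the n/a entries in Table 5.

```python
from typing import List, Optional, Tuple


def metrics_at(scores: List[float], crowded: List[bool],
               cut: float) -> Tuple[float, float, Optional[float]]:
    """Sensitivity, specificity, and PPV when a score >= cut is called 'crowded'."""
    tp = sum(1 for s, y in zip(scores, crowded) if s >= cut and y)
    fp = sum(1 for s, y in zip(scores, crowded) if s >= cut and not y)
    fn = sum(1 for s, y in zip(scores, crowded) if s < cut and y)
    tn = sum(1 for s, y in zip(scores, crowded) if s < cut and not y)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both crowded and uncrowded instances")
    ppv = tp / (tp + fp) if tp + fp else None  # None plays the role of "n/a" in Table 5
    return tp / (tp + fn), tn / (tn + fp), ppv


def site_cut_point(scores: List[float], crowded: List[bool], target: float = 0.80) -> float:
    """Highest observed score whose sensitivity is still >= target (maximizes specificity)."""
    for cut in sorted(set(scores), reverse=True):
        if metrics_at(scores, crowded, cut)[0] >= target:
            return cut
    raise ValueError("target sensitivity cannot be reached")


# Hypothetical data for illustration only.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
crowded = [False, False, False, True, False, True, False, True, True, True]
cut = site_cut_point(scores, crowded)
print(cut, metrics_at(scores, crowded, cut))  # 60 (0.8, 0.8, 0.8)
print(metrics_at(scores, crowded, 200))       # (0.0, 1.0, None)
```

The second call shows the pattern seen for NEDOCS and EDWIN at their published thresholds: a cut point that is never reached yields zero sensitivity, perfect specificity, and an undefined PPV.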

DISCUSSION 

Emergency department crowding is a problem that is tied to perception and circumstance as much as it is grounded in reality. The circumstances that drive clinicians' perceptions of crowding at one ED may have little influence at another. ED clinicians and systems will evolve to meet the challenges presented to them. This constant process of change and adaptation makes developing a universal metric using static parameters difficult, and our study confirms this.

The survey administration procedure was not ideal; it was chosen to maximize the probability of a valid response. The nurses and physicians were instructed not to consult each other while taking the survey, and the interrater data do not show any evidence of collusion or bias. The nonresponse rates of nurses and emergency physicians (13% and 20%, respectively) were primarily due to technical problems with the automated survey tool. Statistical tests showed no significant differences between sampling instances where a complete response was received and those where a partial or no response was received.

This comparison study took place at a 31-bed ED with 40,000 annual visits that is associated with a 520-bed hospital. These circumstances are very different from those of many of the facilities where the crowding scales were derived. Fortunately, these differences serve to clearly illustrate some of our key findings and conclusions.



Table 4
AROC for Each Crowding Scale

Crowding Scale   AROC   95% Confidence Interval
DV               0.66   0.55, 0.76
BR               0.86   0.78, 0.93
AR               0.57   0.44, 0.69
PR               0.53   0.41, 0.66
EDWIN            0.84   0.77, 0.93
NEDOCS           0.92   0.85, 0.97
EDCS             0.64   0.53, 0.75

AROC = area under the receiver operating characteristic curve; DV = demand value; BR = bed ratio; AR = acuity ratio; PR = provider ratio; EDWIN = Emergency Department Work Index; NEDOCS = National Emergency Department Overcrowding Study; EDCS = Emergency Department Crowding Scale.



Rarely did any of the crowding scales achieve their published threshold values. Had we limited our analysis to these published thresholds, the majority of the perceived instances of ED crowding would have gone undetected. However, subsequent AROC analysis did indicate that three of the scales (NEDOCS, EDWIN, and BR) provide good predictive power for instances of perceived ED crowding. For example, at our site, a NEDOCS value of 37.19 was most indicative of a perceived episode of ED crowding (compared with the published value of 100). This supports the theory that there are underlying determinants of ED crowding and that EDWIN and NEDOCS capture this construct. 35 However, it casts major doubt on the hope that any of the scales in their current form can provide a single turnkey solution.

The relatively low prevalence of ED crowding at our site, compared with the sites where these crowding scales were developed, is an important difference and has significant effects on the predictive accuracy of these scales. The prevalence of perceived overcrowding was only 22% at our site during this study, as compared with 65% during the EDWIN and NEDOCS comparison studies. Even after deriving a site-specific threshold for NEDOCS, we were only able to achieve a PPV of 0.62. This translates into a "false-alarm" rate of 38%, and when informally surveyed, many clinicians believed this would negate the usefulness of NEDOCS as a decision support tool. This indicates another potential shortcoming of existing ED crowding scales: a lack of scalability. Before this study, validation of the crowding scales had taken place at EDs where crowding is perceived as a much greater problem. In such settings, they appear to perform relatively well. This study indicates that this level of performance does not transfer to institutions where crowding is not the norm. However, at such institutions, the ability to detect and report crowding and to initiate a clear action plan may be even more important, because informal systems have not evolved to handle unexpected patient surge.
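The dependence of PPV on prevalence follows from Bayes' rule: PPV = Se·p / [Se·p + (1 - Sp)(1 - p)]. The sketch below holds the NEDOCS site-specific sensitivity and specificity (0.81 and 0.87) fixed and varies only the prevalence p. Holding Se and Sp fixed across sites is an illustrative assumption, not a finding, and the value at 22% differs slightly from the observed PPV of 0.62 because the inputs are rounded.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence                    # P(flag and crowded)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(flag and not crowded)
    return true_pos / (true_pos + false_pos)


# 22% = perceived-crowding prevalence at this site; 65% = the EDWIN/NEDOCS comparison studies.
for prevalence in (0.22, 0.65):
    print(prevalence, round(ppv(0.81, 0.87, prevalence), 2))
# 0.22 0.64
# 0.65 0.92
```

With identical accuracy, the scale would raise a false alarm roughly one time in three at 22% prevalence but fewer than one time in ten at 65%.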

The fact that none of the four scales performed as suggested, and that all failed to yield acceptable PPVs even after being adjusted to our site, is due as much to the nature of their construction as to between-site variability. None of the crowding scales evaluated in this report is dimensionless. Instead, they assume that absolute changes in inputs have fixed, absolute impacts on the level of ED crowding. For instance, both NEDOCS and EDCS assume that each additional boarder increases the likelihood of ED crowding by a fixed parameter.
Table 5
Sensitivity, Specificity, and Positive Predictive Value for Institution-Specific Cut Points and for Published Thresholds

                 Institution-Specific Cut Point                   Published Threshold
Crowding Scale   Cut Point   Sensitivity   Specificity   PPV      Threshold   Sensitivity   Specificity   PPV
DV               3.97        0.80          0.51          0.31     7.00        0.12          0.93          0.30
EDWIN            0.54        0.81          0.76          0.47     2.00        0.00          1.00          n/a
NEDOCS           37.19       0.81          0.87          0.62     100.00      0.00          1.00          n/a
EDCS             10          0.77          0.35          0.24     65.00       0.08          1.00          1.00

PPV = positive predictive value; DV = demand value; EDWIN = Emergency Department Work Index; NEDOCS = National Emergency Department Overcrowding Study; EDCS = Emergency Department Crowding Scale.



Similarly, boarding increases EDWIN by decreasing provider effectiveness (the denominator). None of the scales adjusts to site-specific rates or averages. However, the metric against which they were validated, provider perceptions, does.

The failure to adjust to site-specific baselines leads to an operational issue regarding the implementation of any of the current ED crowding scales. Before implementing any crowding scale, a significant period of calibration is likely required. This calibration can take one of two forms. One can, as was done here, take each scale's calculation as given and focus on the identification of threshold values. Alternatively, one can take the published thresholds as given and concentrate on re-estimation of the underlying parameter values. The first approach is the more straightforward and easier to implement. The second has the advantage of maintaining a universal interpretation that makes between-site comparisons possible. Such an approach has significant utility in terms of multisite intervention studies. In either case, our experience suggests that calibration is not limited to simple number crunching. It may also entail reconfiguring clinical information systems to capture all of the needed data points.

Finally, as ED operations evolve, it is likely that capacity and crowding issues will also change. The implication for measures of ED crowding is that whatever crowding scale is adopted will have to be periodically revisited and recalibrated to remain relevant.

LIMITATIONS 

Data were collected over a relatively short period at a single institution, and there is no criterion standard for measuring ED crowding. The scope of our study was limited to identifying which of the existing ED crowding scales correlate best with the perceptions of ED clinicians at our institution. The lessons learned from testing the four quantitative scales at an independent institution should be instructive to other sites interested in implementing a quantitative crowding scale. However, as we have demonstrated, the effectiveness of these crowding scales varies with the prevalence of perceived crowding. Therefore, our results may not generalize to institutions where crowding is a more common occurrence.

CONCLUSIONS 

This study marks the first independent evaluation of READI, EDWIN, NEDOCS, and EDCS. At our site, the suggested thresholds of these crowding scales did not agree with our providers' perceptions of ED crowding. Even after adjusting the scales to site-specific thresholds, a low prevalence of ED crowding resulted in unacceptably low PPVs for each scale. These results indicate that these crowding scales lack scalability and do not perform as designed in EDs where crowding is not the norm. However, two of the crowding scales, EDWIN and NEDOCS, and one of the READI subscales, BR, yielded good predictive power (AROC >0.80) for perceived ED crowding, suggesting that they could be used effectively after a period of site-specific calibration at EDs where crowding is a frequent occurrence.